fix(server): avoid extracting text range filters by LegendPei · Pull Request #3034 · apache/hugegraph

LegendPei · 2026-05-21T02:20:36Z

Purpose of the PR

close [Bug] Unexpected ArrayIndexOutOfBoundsException for filter-step and repeat()-step #2935

This PR fixes the query-planning path exposed by #2935.

For top-level traversals like g.V().has("vp4", P.lt("")).repeat(...).count(),
HugeGraph may extract the text range predicate into a backend query. However,
range predicates on text properties should not be planned as backend range
index queries. This can make the direct traversal fail while an equivalent
match() traversal returns normally.

This change keeps such text range predicates in the traversal layer instead of
pushing them down to the backend query, making the direct traversal consistent
with the equivalent match() form.

Main Changes

Avoid extracting Text property predicates with gt/gte/lt/lte into
HugeGraphStep / HugeVertexStep backend queries.
Preserve normal extraction for system properties and non-text property
predicates.
Add a regression test for the reported traversal shape:
has("vp4", P.lt("")).repeat(__.out("el2")).emit().times(1).count().
Verify that the direct traversal returns the same result as the equivalent
match() traversal.

Verifying these changes

Trivial rework / code cleanup without any test coverage. (No Need)
Already covered by existing tests, such as (please modify tests here).
Need tests and can be verified as follows:
- mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -Dtest=CountStrategyCoreTest#testRepeatAfterTextRangeFilterWithEmptyResult -DfailIfNoTests=false "-Drat.skip=true" "-Dcheckstyle.skip=true" test
- mvn -pl hugegraph-server/hugegraph-test -am -P core-test,rocksdb -Dtest=CountStrategyCoreTest#testRepeatAfterTextRangeFilterWithEmptyResult -DfailIfNoTests=false "-Drat.skip=true" "-Dcheckstyle.skip=true" test

Does this PR potentially affect the following parts?

Dependencies (add/update license info & regenerate_known_dependencies.sh)
Modify configurations
The public API
Other affects (typed here)
Nope

Documentation Status

Doc - TODO
Doc - Done
Doc - No Need

Copilot

Pull request overview

Fixes a traversal query-planning bug where range predicates (gt/gte/lt/lte) on Text properties were being extracted/pushed down into backend queries, leading to incorrect behavior (and the exception reported in #2935). The change keeps these text range predicates in the traversal layer while adding a regression test to ensure direct traversals behave consistently with equivalent match() traversals.

Changes:

Prevent extraction of HasStep containers when they include range comparisons on Text properties.
Keep existing extraction behavior for system properties and non-text property predicates (intended).
Add a regression test reproducing the traversal shape from #2935 and asserting consistent results between direct and match() forms.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java	Adds a guard to stop extracting `HasContainer`s with text range predicates into backend query steps.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/core/CountStrategyCoreTest.java	Adds a regression test covering a text range filter followed by `repeat(...).emit().times(1).count()`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…fix/fix-text-range-filter # Conflicts: # hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java

codecov · 2026-05-21T06:59:50Z

Codecov Report

❌ Patch coverage is 88.23529% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.36%. Comparing base (454dd3d) to head (0b45ebb).

Files with missing lines	Patch %	Lines
...he/hugegraph/traversal/optimize/TraversalUtil.java	88.23%	3 Missing and 3 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #3034      +/-   ##
============================================
- Coverage     35.94%   31.36%   -4.58%     
- Complexity      338      500     +162     
============================================
  Files           803      814      +11     
  Lines         68053    69240    +1187     
  Branches       8907     9113     +206     
============================================
- Hits          24460    21720    -2740     
- Misses        40968    45118    +4150     
+ Partials       2625     2402     -223

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

imbajin

The original correctness fix is clear, but the latest version now broadens the PR into a more aggressive HasStep split-and-remove optimization. I left one scope/risk comment suggesting we keep this PR on the minimal safe path and move the broader optimization to a follow-up.

imbajin · 2026-05-21T13:17:03Z

+            if (!GraphStep.processHasContainerIds(newStep, has)) {
+                newStep.addHasContainer(has);
+            }
+            holder.removeHasContainer(has);


‼️ I think we should scope this PR back down before merging. The original bug is a focused correctness issue: text range predicates such as has("vp4", P.lt("")) should not be pushed into the backend query path. The latest implementation now goes further and splits a mixed HasStep, pushes selected containers down, removes them from the original HasStep, and adds new label/index-availability logic in TraversalUtil.

Original bug boundary Text range predicate | v wrongly pushed to backend | v exception / mismatch with match()

Latest implementation boundary HasStep(vp4 < "", age = 1) | +-- push age = 1 into HugeGraphStep / HugeVertexStep | +-- remove age = 1 from the original HasStep | +-- rely on new label/index checks to prove backend can enforce it

That is a much broader optimizer change than the original fix, and once a container is removed from the original HasStep, correctness depends on canExtractHasContainer() exactly matching what the backend planner can really execute. The new tests cover several cases, but this still expands the review surface into composite-index matching, label context, paging/sysprop interactions, and future index-planner behavior.

For this PR, I would prefer one of these smaller and safer shapes:

Option 1: minimal correctness fix - keep unsafe Text + range predicates in the traversal layer - do not split mixed HasSteps - avoid adding backend index-availability logic here Option 2: low-risk partial prefiltering - optionally push safe containers down as redundant backend pre-filters - keep the original HasStep unchanged unless all containers are extractable - correctness remains guarded by the full traversal-layer filter

Safer partial prefiltering shape Backend prefilter: age = 1 Traversal filter still intact: vp4 < "" AND age = 1 Result: backend may reduce candidates, but traversal still owns correctness

The current split-and-remove optimization can still be valuable, but I think it should be a separate follow-up PR with dedicated tests for composite-index prefix matching, label-less queries, ~page / paging semantics when some user filters remain in traversal, and parity with the backend index planner. Keeping this PR minimal would make the #2935 fix easier to reason about and safer to merge.

Okay, sorry. I was too obsessed with fixing the review issue of copilot earlier, which led to a slight deviation in direction. I'll adjust my focus now.

Thanks for adjusting the direction. To clarify my previous comment: I do not mean that we should choose freely between the two shapes. For this PR, I would prefer the minimal option A.

The original issue is small and focused: Text range predicates should not be extracted into the backend query path. So the safest fix here is to add the narrow condition that blocks only that unsafe case, while keeping the existing extraction behavior unchanged for the rest.

Preferred shape for this PR HasStep(vp4 < "", age = 1) | v if any extracted container is Text + range: keep this HasStep in traversal layer avoid backend text-range extraction No new index-planner logic No mixed HasStep split/remove behavior No broader optimizer contract change

This may leave a small performance loss for the uncommon case where one has() step mixes a text-range predicate with other safe predicates, but it keeps the patch proportional to the bug and avoids introducing new correctness risks around label context, index matching, composite index prefixes, paging, or unique/search/numeric index behavior.

PS: the more aggressive split-and-remove optimization can still be considered later

Thanks for the clarification. I agree with option A for this PR.

I have scoped the local changes back to the minimal correctness fix:

no mixed HasStep split/remove behavior

no new index-planner or label/index availability logic

if a HasStep contains a Text range predicate (gt/gte/lt/lte), keep the whole HasStep in the traversal layer

keep the existing extraction behavior for other cases as much as possible

This intentionally accepts the small potential performance loss for mixed HasSteps, so the fix stays proportional to #2935. I’ll push the narrowed patch after one final local check.

Now this fix adopts a conservative strategy. When encountering the same HasStep mixed with text range and other security conditions, the entire HasStep will be retained, possibly with less backend push-down optimization

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

fix(server): avoid extracting text range filters

e6ff06e

dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels May 21, 2026

imbajin requested a review from Copilot May 21, 2026 06:36

Copilot started reviewing on behalf of imbajin May 21, 2026 06:36 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread ...rver/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java Outdated

Comment thread ...rver/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java

LegendPei added 2 commits May 21, 2026 14:56

fix(server): avoid extracting text range filters

1446b2b

Merge remote-tracking branch 'origin/fix/fix-text-range-filter' into …

4dacdec

…fix/fix-text-range-filter # Conflicts: # hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java

test(server): cover text range filter extraction

a67a886

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels May 21, 2026

LegendPei requested a review from Copilot May 21, 2026 07:36

Copilot AI reviewed May 21, 2026

View reviewed changes

fix(server): address text filter review comments

1d84368

LegendPei requested a review from Copilot May 21, 2026 11:03

Copilot AI reviewed May 21, 2026

View reviewed changes

fix(server): keep has extraction traversal stable

b2b784e

LegendPei requested a review from Copilot May 21, 2026 12:05

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread ...rver/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java Outdated

Comment thread ...rver/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java

fix(server): extract safe has containers individually

c8b4bf0

LegendPei requested a review from Copilot May 21, 2026 12:31

Copilot AI reviewed May 21, 2026

View reviewed changes

LegendPei added 2 commits May 21, 2026 20:50

fix(server): scope has extraction to label indexes

b2ff85b

fix(server): align has extraction with index matching

ddf4671

LegendPei requested a review from Copilot May 21, 2026 13:15

imbajin reviewed May 21, 2026

View reviewed changes

Copilot AI reviewed May 21, 2026

View reviewed changes

LegendPei added 2 commits May 21, 2026 21:55

fix(server): narrow text range filter extraction

0b45ebb

test(server): cover text range has extraction paths

a361614

Conversation

LegendPei commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose of the PR

Main Changes

Verifying these changes

Does this PR potentially affect the following parts?

Documentation Status

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

imbajin left a comment

Choose a reason for hiding this comment

Uh oh!

imbajin May 21, 2026

Choose a reason for hiding this comment

Uh oh!

LegendPei May 21, 2026

Choose a reason for hiding this comment

Uh oh!

imbajin May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LegendPei May 21, 2026

Choose a reason for hiding this comment

Uh oh!

LegendPei May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

LegendPei commented May 21, 2026 •

edited

Loading

codecov Bot commented May 21, 2026 •

edited

Loading

imbajin May 21, 2026 •

edited

Loading